Decision Tree

See the backing repository for Decision Tree here.

Summary

A supervised decision tree. This is a recursive partitioning method in which the feature space is repeatedly split into smaller partitions according to a split criterion, and a predicted value is learned for each partition in the “leaf nodes” of the learned tree. This implementation is a light wrapper around the decision trees exposed in scikit-learn. Single decision trees often have weak predictive performance, but they are fast to train and good at identifying associations. Shallow decision trees are easy to interpret, but they quickly become complex and unintelligible as the depth of the tree increases.
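To make the recursive-partitioning idea concrete, the sketch below (plain Python, not InterpretML's implementation) greedily splits a one-dimensional feature on the threshold that minimizes squared error, recurses to a depth limit, and stores the mean of each partition as the leaf's predicted value:

```python
# Minimal sketch of recursive partitioning for 1-D regression.
# Illustration only -- the real library wraps scikit-learn's trees.

def mean(ys):
    return sum(ys) / len(ys)

def sse(ys):
    # Sum of squared errors around the mean: the split criterion here.
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(xs, ys, depth, max_depth=3):
    # Stop at the depth limit or when no split is possible.
    if depth >= max_depth or len(set(xs)) == 1:
        return {"leaf": mean(ys)}
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds
        left = [(x, y) for x, y in zip(xs, ys) if x <= t]
        right = [(x, y) for x, y in zip(xs, ys) if x > t]
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, t, left, right)
    _, t, left, right = best
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }

def predict_one(tree, x):
    # Walk from the root to a leaf, then return the leaf's learned value.
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = build_tree(xs, ys, depth=0, max_depth=2)
print(predict_one(tree, 2.5), predict_one(tree, 11))  # -> 1.0 5.0
```

The first split lands at the threshold separating the two clusters (x <= 3), and each side then predicts its own mean, which is exactly the partition-then-predict behavior described above.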

How it Works

Christoph Molnar’s “Interpretable Machine Learning” e-book [1] has an excellent overview on decision trees that can be found here.

For implementation specific details, scikit-learn’s user guide [2] on decision trees is solid and can be found here.

Code Example

The following code trains a decision tree classifier on the breast cancer dataset and produces visualizations for both global and local explanations.

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from interpret.glassbox import ClassificationTree
from interpret import show

seed = 1
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

dt = ClassificationTree(random_state=seed)
dt.fit(X_train, y_train)

dt_global = dt.explain_global()
show(dt_global)

dt_local = dt.explain_local(X_test[:5], y_test[:5])
show(dt_local)

Bibliography

[1] Christoph Molnar. Interpretable Machine Learning. Lulu.com, 2020.

[2] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and others. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

API

ClassificationTree

class interpret.glassbox.ClassificationTree(max_depth=3, feature_names=None, feature_types=None, **kwargs)

Initializes tree with low depth.

Parameters
  • max_depth – Max depth of tree.

  • feature_names – List of feature names.

  • feature_types – List of feature types.

  • **kwargs – Keyword arguments passed to the tree's fit() method.

explain_global(name=None)

Provides global explanation for model.

Parameters

name – User-defined explanation name.

Returns

An explanation object, visualizing feature-value pairs as a horizontal bar chart.

explain_local(X, y=None, name=None)

Provides local explanations for provided instances.

Parameters
  • X – Numpy array for X to explain.

  • y – Numpy vector for y to explain.

  • name – User-defined explanation name.

Returns

An explanation object.

fit(X, y)

Fits model to provided instances.

Parameters
  • X – Numpy array for training instances.

  • y – Numpy array as training labels.

Returns

Itself.

predict(X)

Predicts on provided instances.

Parameters

X – Numpy array for instances.

Returns

Predicted class label per instance.

predict_proba(X)

Probability estimates on provided instances.

Parameters

X – Numpy array for instances.

Returns

Probability estimate of instance for each class.
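For scikit-learn tree classifiers, which ClassificationTree wraps, predict() agrees with predict_proba(): each instance's predicted label is the class with the highest estimated probability. A hand illustration with hypothetical probability rows:

```python
# predict() corresponds to taking the argmax of each row of
# predict_proba() (hypothetical probability estimates, two classes).
probs = [
    [0.9, 0.1],  # instance 0: class 0 most likely
    [0.3, 0.7],  # instance 1: class 1 most likely
]
labels = [row.index(max(row)) for row in probs]
print(labels)  # -> [0, 1]
```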

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type

float
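In the single-label case, the mean accuracy returned by score is simply the fraction of instances whose predicted label matches the true label. A hand computation with hypothetical labels:

```python
# Mean accuracy: the fraction of instances predicted correctly
# (hypothetical true and predicted labels).
def mean_accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]  # one of five predictions is wrong
print(mean_accuracy(y_true, y_pred))  # -> 0.8
```

With sample_weight, each correct prediction contributes its weight instead of 1, and the denominator becomes the sum of the weights.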

RegressionTree

class interpret.glassbox.RegressionTree(max_depth=3, feature_names=None, feature_types=None, **kwargs)

Initializes tree with low depth.

Parameters
  • max_depth – Max depth of tree.

  • feature_names – List of feature names.

  • feature_types – List of feature types.

  • **kwargs – Keyword arguments passed to the tree's fit() method.

explain_global(name=None)

Provides global explanation for model.

Parameters

name – User-defined explanation name.

Returns

An explanation object, visualizing feature-value pairs as a horizontal bar chart.

explain_local(X, y=None, name=None)

Provides local explanations for provided instances.

Parameters
  • X – Numpy array for X to explain.

  • y – Numpy vector for y to explain.

  • name – User-defined explanation name.

Returns

An explanation object.

fit(X, y)

Fits model to provided instances.

Parameters
  • X – Numpy array for training instances.

  • y – Numpy array as training labels.

Returns

Itself.

predict(X)

Predicts on provided instances.

Parameters

X – Numpy array for instances.

Returns

Predicted value per instance.

score(X, y, sample_weight=None)

Return the coefficient of determination \(R^2\) of the prediction.

The coefficient \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
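The definition above can be checked by hand. The sketch below (plain Python, hypothetical values) computes \(u\), \(v\), and \(1 - u/v\) directly, and confirms that a constant mean predictor scores exactly 0.0:

```python
# R^2 = 1 - u/v, with u the residual sum of squares and v the total
# sum of squares (hypothetical true values and predictions).
def r2_score(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - u / v

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]          # u = 0.5, v = 5.0
print(r2_score(y_true, y_pred))        # -> 0.9

# A constant model predicting the mean of y scores 0.0 (u == v).
print(r2_score(y_true, [2.5] * 4))     # -> 0.0
```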

Parameters
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score – \(R^2\) of self.predict(X) w.r.t. y.

Return type

float

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).